Where can I find the BAM file where reads are associated with bead barcodes and/or molecular barcodes?
- By default, the Seeker pipeline does not generate a BAM file in which reads are associated with bead or molecular barcodes. However, the association can be produced by merging two of the pipeline’s intermediate outputs (r1-db and r2-db). The output is in parquet format with one read per row; each row records the read ID and the bead barcode sequence.
- To merge r1-db and r2-db, run the module named gen-merged-r1-r2-db in the Seeker Singularity container by following the steps below:
- Find the two intermediate files in the work folder for the step
PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB
in
${root_output_dir}/work/${step_hash}
The beginning of the ${step_hash} for this folder can be identified in this file:
${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt
[Figure: example of execution_trace_${date}_${time}.txt, with the beginning of ${step_hash} boxed in yellow and the step name PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB boxed in red]
Once you have identified the value boxed in yellow, look for this folder:
${root_output_dir}/work/${step_hash}
Note: The value in yellow is only the beginning of the step_hash; use tab completion in your shell to find the full folder name.
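The lookup described above can also be scripted. The sketch below is one possible approach and is not part of the Seeker pipeline: find_step_hash is a hypothetical helper name, and it assumes that the trace file lists each task on a single line and that the short hash has the form xx/xxxxxx (two hexadecimal characters, a slash, six hexadecimal characters) and appears before any other text of that shape on the line.

```shell
# Print the short work-folder hash (for example "99/387e66") for every line of
# a trace file that contains the given step name. Hypothetical helper.
find_step_hash() {
    local trace_file="$1"
    local step_name="$2"
    # index() does a literal substring search for the step name; match() then
    # picks out the first xx/xxxxxx hexadecimal token on that line.
    awk -v step="$step_name" '
        index($0, step) && match($0, /[0-9a-f][0-9a-f]\/[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]/) {
            print substr($0, RSTART, RLENGTH)
        }
    ' "$trace_file"
}

# Usage (substitute your own values for the placeholders):
#   hash_prefix=$(find_step_hash \
#       "${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt" \
#       "PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB")
#   ls -d "${root_output_dir}/work/${hash_prefix}"*
```

If the trace contains more than one matching line (for example, when several samples were processed in the same run), the function prints one hash per line, and you will need to pick the one that belongs to your sample.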
Copy your samplesheet.csv to this folder.
Make sure that the ${Sample_ID}-r1-db, ${Sample_ID}-r2-db, and the samplesheet.csv files for your sample of interest can be located in this folder.
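Before running the merge, it can help to confirm that the three inputs are where the command expects them. The following is a minimal sketch only: check_merge_inputs is a hypothetical helper, not a pipeline command, and it assumes the inputs follow the ${Sample_ID}-r1-db and ${Sample_ID}-r2-db naming described above.

```shell
# Report which of the inputs needed by gen-merged-r1-r2-db are present.
# Returns 0 if all three exist and 1 if any is missing. Hypothetical helper.
check_merge_inputs() {
    local work_dir="$1"
    local sample_id="$2"
    local samplesheet="$3"
    local missing=0
    local path
    for path in \
        "${work_dir}/${sample_id}-r1-db" \
        "${work_dir}/${sample_id}-r2-db" \
        "${samplesheet}"
    do
        # -e accepts both files and directories, since the type of the
        # r1-db / r2-db outputs is not assumed here.
        if [ -e "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (substitute your own values for the placeholders):
#   check_merge_inputs "${root_output_dir}/work/${step_hash}" "${sample_id}" "${path_to_samplesheet}"
```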
Run the command below (in the same folder):
singularity exec ${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}
- ${path_to_curioseekerv2_singularity_container}: You can find this path in the nextflow.config file (curioseeker-2.0.0/nextflow.config), as defined by the parameter curio_seeker_singularity.
- ${path_to_samplesheet}: path to the samplesheet.csv you used to process this sample
- ${sample_id}: the Sample_ID used for processing this sample
Example Command:
singularity exec /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen
After a successful run, a folder named ${Sample_ID}-r1-r2-merged is created in the same work folder. It contains chunked parquet files in which each row is a read.
- Read ID is defined in column read1_id
- Bead barcode is defined in column BM
Additionally, include only rows where column r1_proper_structure_matched == True and column XS == Assigned.
Troubleshooting:
- If the above command gives this error:
Invalid value for '--samplesheet': Path 'samplesheet.csv' does not exist
Include the --bind flag, as shown below, to fix the issue.
singularity exec \
--bind ${root_samplesheet_folder} \
${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}
Example Command:
singularity exec --bind /mnt/ /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen
Here, the --bind flag mounts a directory from the host machine (/mnt/) into the container, so that the container can access the contents of that directory, including the samplesheet.